What about Inputting Policy in Value Function: Policy Representation and Policy-Extended Value Function Approximator
Abstract
We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an extension enables PeVFA to preserve the values of multiple policies at the same time and brings an appealing characteristic, i.e., value generalization among policies. We formally analyze this value generalization under Generalized Policy Iteration (GPI). Through both theoretical and empirical lenses, we show that the generalized value estimates offered by PeVFA may have lower initial approximation error to the true values of successive policies, which is expected to improve consecutive value approximation during GPI. Based on these clues, we introduce a new form of GPI with PeVFA that leverages value generalization along the policy improvement path. Moreover, we propose a representation learning framework for RL policies, providing several approaches to learn effective policy embeddings from policy network parameters or state-action pairs. In our experiments, we evaluate the efficacy of the value generalization offered by PeVFA and of policy representation learning in several OpenAI Gym continuous control tasks. As a representative instance of algorithm implementation, Proximal Policy Optimization (PPO) re-implemented under the paradigm of GPI with PeVFA achieves about a 40% performance improvement over its vanilla counterpart in most environments.
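The core PeVFA idea, one approximator whose input is extended with a policy representation so that it can hold the values of several policies and generalize to a new one, can be illustrated with a deliberately tiny sketch. Everything in it is a hypothetical assumption rather than the paper's method: a linear approximator instead of a neural network, 1-D states, a family of linear policies π_k(s) = k·s, a "policy embedding" that simply averages state-action pairs (standing in for the learned embeddings described above), and a made-up ground-truth value function used as regression targets in place of GPI or PPO.

```python
import random

# Toy setup (all hypothetical): 1-D states in [0, 1] and a family of
# deterministic linear policies pi_k(s) = k * s indexed by a gain k.
EMBED_STATES = [0.0, 0.25, 0.5, 0.75, 1.0]


def make_policy(k):
    return lambda s: k * s


def policy_embedding(policy, states=EMBED_STATES):
    """Hypothetical state-action-pair embedding: the mean of the (s, pi(s)) pairs."""
    n = len(states)
    return [sum(states) / n, sum(policy(s) for s in states) / n]


def true_value(s, k):
    # Made-up ground-truth value, linear in s and k so a linear PeVFA can fit it.
    return 1.0 + 2.0 * s + k


class LinearPeVFA:
    """V(s, chi_pi) = w . [1, s, chi_pi]: the policy embedding is an extra input."""

    def __init__(self, embed_dim, lr=0.05):
        self.w = [0.0] * (2 + embed_dim)
        self.lr = lr

    @staticmethod
    def _features(s, chi):
        return [1.0, s] + list(chi)

    def predict(self, s, chi):
        return sum(w * x for w, x in zip(self.w, self._features(s, chi)))

    def update(self, s, chi, target):
        # One SGD step on 0.5 * (target - V(s, chi))^2.
        x = self._features(s, chi)
        err = target - self.predict(s, chi)
        self.w = [w + self.lr * err * xi for w, xi in zip(self.w, x)]


def train(gains, steps=20000, seed=0):
    """Fit a single PeVFA to the values of several policies at the same time."""
    rng = random.Random(seed)
    embeds = {k: policy_embedding(make_policy(k)) for k in gains}
    model = LinearPeVFA(embed_dim=2)
    for _ in range(steps):
        k = rng.choice(gains)
        s = rng.random()
        model.update(s, embeds[k], true_value(s, k))
    return model


if __name__ == "__main__":
    model = train(gains=[0.0, 1.0, 2.0])
    # Query a policy never seen in training: generalization among policies.
    chi_new = policy_embedding(make_policy(1.5))
    for s in (0.0, 0.5, 1.0):
        print(s, round(model.predict(s, chi_new), 3), true_value(s, 1.5))
```

Because the toy value is linear in both s and k, the fitted model predicts values for the unseen gain k = 1.5 almost exactly. With real policies and neural approximators, generalization among policies is only approximate, which is consistent with the abstract's hedged claim of possibly lower initial approximation error rather than exact transfer.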
Similar sources
Value Function Approximation and Policy Performance
Fig. 1 gives a geometric interpretation of value function approximation. We may think of J* as a vector in ∗; by considering approximations of the form J̃ = Φr, we restrict attention to the hyperplane J = Φr in the same space. Given a norm ‖·‖ (e.g., the Euclidean norm), an ideal value function approximation algorithm would choose r minimizing ‖J* − Φr‖; in other words, it would find the projec...
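Under the Euclidean norm, choosing r to minimize the distance between the target value vector J* and Φr is an ordinary least-squares problem: r solves the normal equations ΦᵀΦ r = ΦᵀJ*, and Φr is the orthogonal projection of J* onto the set spanned by the columns of Φ. The following is a minimal pure-Python sketch, assuming Φ has full column rank; the 3-state J* and the two features (constant and state index) are made up for illustration.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda row: abs(M[row][i]))
        M[i], M[p] = M[p], M[i]
        for row in range(i + 1, n):
            f = M[row][i] / M[i][i]
            for c in range(i, n + 1):
                M[row][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def project(Phi, J):
    """Return (r, Phi r) with r minimizing the Euclidean norm ||J - Phi r||."""
    Phi_t = transpose(Phi)
    gram = matmul(Phi_t, Phi)                                    # Phi^T Phi (k x k)
    rhs = [sum(p * j for p, j in zip(col, J)) for col in Phi_t]  # Phi^T J
    r = solve(gram, rhs)
    J_tilde = [sum(p * ri for p, ri in zip(row, r)) for row in Phi]
    return r, J_tilde


if __name__ == "__main__":
    Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # features: constant, state index
    J_star = [1.0, 2.0, 4.0]                     # made-up target values
    r, J_tilde = project(Phi, J_star)
    residual = [j - jt for j, jt in zip(J_star, J_tilde)]
    # Inner products of the residual with each feature column (all ~0 for a projection).
    print(r, J_tilde, [sum(p * e for p, e in zip(col, residual)) for col in transpose(Phi)])
```

The check at the end is the geometric point of the excerpt: the residual J* − Φr is orthogonal to every feature column, so Φr is the closest point to J* within the approximation architecture. A weighted norm would change the normal equations to ΦᵀDΦ r = ΦᵀDJ* for a diagonal weight matrix D.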
Value-Oriented Policy Taking and Contextual Architecture in Historical Context
Today, contextual architecture in historical settings is one of the most important topics in the field of architecture because of its inherent value and practical concepts, and it is directly related to the knowledge, awareness, and decision-making of individuals. The more valuable the knowledge, the deeper, more complete, and more accurate it will be extracted ...
Origins of Armenia’s Foreign Policy and Its Foreign Policy Towards Iran
Foreign policy is rooted in complicated matters, and this may be even more true of Armenia. Although the new government of Armenia is less than 20 years old, the people of this territory were the first to officially accept Christianity. In ancient times, these people were part of great empires such as Iran, Rome, and Byzantium. Armenia is regarded as a nation with a privileged hist...
Policy Gradient With Value Function Approximation For Collective Multiagent Planning
Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDec-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contrib...
Characterizing a Brain-Based Value-Function Approximator
The field of Reinforcement Learning (RL) in machine learning relates significantly to the domains of classical and instrumental conditioning in psychology, which give an understanding of biology’s approach to RL. In recent years, there has been a thrust to correlate some machine learning RL algorithms with brain structure and function, a benefit to both fields. Our focus has been on one such st...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i8.20820